Amendments to the Specification

Please amend the title of the invention on the cover page and on page 1 from
SYSTEM AND METHOD FOR EXECUTING PREDICATED CODE OUT OF ORDER to read as
follows: SYSTEM AND METHOD FOR PREVENTING PREDICATED INSTRUCTIONS FROM
DELAYING EXECUTION OF CONSUMER INSTRUCTIONS.






Please amend the first full paragraph on page 2 as follows: 



One example of an advanced microarchitecture is that of a dynamic, or out-of-order,
execution model. An out-of-order execution model is generally more complex than a
static execution model. Static execution executes code in the order scheduled
statically by the compiler, whereas out-of-order execution permits the processor to
dynamically adjust instruction scheduling to the run-time behavior of the program.
Because of this ability to adapt to the run-time environment, dynamic execution has
been employed in many processor designs. The potential performance gains of an
out-of-order execution model are facilitated by two techniques: i) register renaming,
where registers are renamed to eliminate false dependencies; and ii) dynamic
scheduling, where instructions are reordered to reduce unnecessary stalls in the
pipeline.
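To make the first technique concrete, the following Python sketch renames a short instruction sequence. The `rename` helper, the eight architectural registers r0 to r7, and the physical register names P0, P1, and so on are hypothetical; the sketch illustrates register renaming in general, not the renaming unit of any particular embodiment. Each destination receives a fresh physical register, so two writes to the same architectural register no longer conflict, while true read-after-write dependencies are preserved.

```python
from itertools import count
from typing import List, Tuple

Instr = Tuple[str, List[str]]  # (architectural destination, architectural sources)


def rename(instructions: List[Instr], num_arch_regs: int = 8) -> List[Tuple[str, List[str]]]:
    """Map architectural registers to physical registers (hypothetical sketch)."""
    fresh = count(num_arch_regs)  # physical ids not used by the initial mapping
    table = {f"r{i}": f"P{i}" for i in range(num_arch_regs)}  # architectural -> physical
    renamed = []
    for dest, srcs in instructions:
        phys_srcs = [table[s] for s in srcs]  # read the current mappings of the sources
        table[dest] = f"P{next(fresh)}"       # then give the destination a new register
        renamed.append((table[dest], phys_srcs))
    return renamed
```

For the sequence r1 = r2 + r3, r1 = r4, r5 = r1, the two writes to r1 land in P8 and P9, and the final instruction reads P9, so the write-after-write conflict on r1 disappears.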



Please amend the paragraph bridging pages 6-6 as follows: 



There are several types and variations of out-of-order, or dynamic, execution
processors. A dynamic microarchitecture as a baseline performance embodiment is
shown in Fig. 1. The baseline performance embodiment includes a dynamic portion 105
of the processor 100, including a register renaming unit 110, which maps between
temporary and architectural files, a reorder buffer 120, a plurality of reservation stations
130, and a plurality of execution units 140. A bus 115 couples the register renaming
unit 110, the reorder buffer 120, the plurality of reservation stations 130, and the
plurality of execution units 140 together and to the remaining portions of the
microprocessor, which are not shown. The pipeline shown in Fig. 1A has 15 stages,
with 7 stages 155-161 devoted to the dynamic portion 105 of the processor 100. The
dynamic pipeline 155-161 begins with a 2-stage rename 155-156, followed by a register
read stage 157, a 2-stage schedule 158-159, an execute stage 160, and finally a retire
stage 161. In the schedule stage 158-159, the instructions wait in the reservation
stations 130 until the data of the source operands become available. After the data
from the source operands are loaded into the register, the instruction enters the
execute stage 160. In the final retire stage 161, the instructions are retired in order
from the reorder buffer.
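The waiting behavior of the schedule stage can be sketched as a simple wake-up loop. This is a minimal Python model under stated assumptions: the `schedule` function, the (name, destination, sources) tuple layout, and a one-cycle execute latency are hypothetical simplifications, not details of the reservation stations 130.

```python
from typing import Iterable, List, Set, Tuple

RSEntry = Tuple[str, str, List[str]]  # (name, physical destination, physical sources)


def schedule(instrs: List[RSEntry], ready: Iterable[str]) -> List[List[str]]:
    """Return the names issued in each cycle from a hypothetical reservation station."""
    ready_regs: Set[str] = set(ready)
    waiting = list(instrs)
    cycles = []
    while waiting:
        # An entry issues only when the data of all of its source operands are available.
        issued = [e for e in waiting if all(s in ready_regs for s in e[2])]
        if not issued:
            raise RuntimeError("a source operand is never produced")
        cycles.append([name for name, _, _ in issued])
        # Assumed unit latency: results become available for the next cycle.
        for _, dest, _ in issued:
            ready_regs.add(dest)
        waiting = [e for e in waiting if e not in issued]
    return cycles
```

With P1 to P5 initially ready, an add writing P8, a mul reading P8, and an independent sub issue as [['add', 'sub'], ['mul']]: the sub overtakes the waiting mul, which is the reordering that dynamic scheduling provides.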



Please amend the first full paragraph on page 8 as follows: 



The performance of a dynamic execution processor can degrade with the above-
described predicated code sequence. When a consumer instruction reaches the
rename stage 155, 156, the renaming of the common register becomes ambiguous if
the guarding predicates of the defining instructions are not resolved. In the middle 190
of Fig. 1C, two add instructions, guarded by p9 and p3, assign their respective results to
the same architectural register r40. After renaming 194, the result register is renamed
to rB and rC, respectively. A mov instruction that uses, or consumes, the result register
follows immediately in the pipeline. If the mov instruction enters the rename stage
before predicates p9 and p3 are evaluated, then the processor cannot correctly
determine whether to rename r40 to physical register rB or rC. Therefore, the
processor stalls the consumer instruction, the mov instruction, before entering it into
the rename stage.
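The stall condition of this baseline can be expressed as a small check over the in-flight definitions of r40. In this Python sketch, each definition is a hypothetical (predicate value, physical register) pair in program order, with None standing for an unresolved predicate, and `prior` is an assumed name for the physical register that held r40 before the predicated definitions. A return value of None means the consumer is held before the rename stage.

```python
from typing import List, Optional, Tuple

Definition = Tuple[Optional[bool], str]  # (predicate value or None if unresolved, physical reg)


def baseline_source(defs: List[Definition], prior: str) -> Optional[str]:
    """Baseline rule: rename the consumer only after every guarding predicate is resolved."""
    if any(pred is None for pred, _ in defs):
        return None  # ambiguous renaming: stall the consumer
    for pred, reg in reversed(defs):
        if pred:
            return reg  # the latest definition whose predicate is true
    return prior  # assumption: all definitions nullified, keep the earlier mapping
```

With p9 unresolved and p3 true, `baseline_source([(None, "rB"), (True, "rC")], "rA")` still returns None, because this baseline waits for every guarding predicate.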



Please amend the first full paragraph on page 9 as follows: 
A consumer instruction is not required to wait for the resolution of all guarding
predicates of the defining instructions, as shown in Fig. 2A. The consumer instruction
must only wait for the latest defining instruction that is guarded true. Therefore, the
consumer instruction first waits for the predicate of the last of the defining instructions to
become available 256. If the predicate of the last of the defining instructions turns out
true 258, the consumer instruction can immediately advance in the pipeline 200 and, in
this example, use the physical register of the last defining instruction, regardless of the
outcome of the other defining instructions. If the last defining instruction is not true (i.e.,
it is nullified), then the consumer instruction must wait for the predicate of the second-to-
last defining instruction 260. The process repeats until a latest defining instruction is
guarded true. The performance of this prioritized checking scheme depends on the
order in which the predicate values become available. It will be further appreciated that
the instructions represented by the blocks in Fig. 2A are not required to be performed in
the order illustrated, and that all the processing represented by the blocks may not be
necessary to practice the invention.
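The prioritized check of Fig. 2A can be sketched by walking the definitions from the latest to the earliest. The (predicate value, physical register) pair layout and the fallback to a hypothetical `prior` register when every definition is nullified are assumptions of this Python sketch, since the paragraph does not address that case.

```python
from typing import List, Optional, Tuple

Definition = Tuple[Optional[bool], str]  # (predicate value or None if unresolved, physical reg)


def prioritized_source(defs: List[Definition], prior: str) -> Optional[str]:
    """Return the register the consumer may use, or None while it must keep waiting."""
    for pred, reg in reversed(defs):  # latest defining instruction first
        if pred is None:
            return None  # wait for this predicate to become available
        if pred:
            return reg  # latest defining instruction guarded true: advance now
        # pred is False: this definition is nullified, check the next-earlier one
    return prior  # assumption: every definition nullified, keep the earlier mapping
```

Unlike the baseline rule, `prioritized_source([(None, "rB"), (True, "rC")], "rA")` returns "rC" without waiting for p9, whereas an unresolved last predicate returns None whatever the earlier predicates are.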

Please amend the paragraph bridging pages 9 and 10 as follows: 

According to the baseline performance embodiment described above, the simple
dynamic processor that runs predicated code could suffer from excessive pipeline stalls
due to scheduling and renaming issues. One alternative embodiment postpones the
predicated instructions down the pipeline and resolves the predicated instructions
without significant change to the existing dynamic execution microarchitecture.



Please amend the first full paragraph on page 10 as follows: 




For one embodiment, a select-µop addresses the issue of overlapping variable
lifetimes. A select-µop eliminates the ambiguity of renaming by effectively postponing
the renaming task. Using the select-µop enables renaming of registers without stalling
the pipeline to disambiguate renaming, thus reducing the stall cycles. A select-µop is a
single-assignment form that guarantees that every target operand is uniquely defined by
only one instruction. Thus, when a variable is defined in several basic blocks throughout
a control flow graph, each definition instance of the variable is subscripted to be uniquely
differentiated from other definition instances of the variable. If multiple definition
instances of the variable reach a common use of the variable, then a consumer
instruction cannot determine which of the subscripted variables to use. For one
embodiment, the compiler inserts a φ-node as a special placeholder at the position where
two definition instances merge. The two subscripted definition variables are used as the
source operands of the new φ-node, and a new subscripted variable is created as the
new destination operand. From that point on, all subsequent uses of the variable are
replaced with the new subscripted variable defined by the φ-node. One embodiment of
subscripting and inserting a φ-node is illustrated in Fig. 3.
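Subscripting and φ-node insertion can be illustrated for the simplest case of one two-way merge. In this Python sketch, `ssa_merge` is a hypothetical helper that numbers every definition of a variable with a unique subscript and builds a φ-node whose sources are the last definitions reaching the merge from each predecessor block; it does not construct a full control flow graph or compute where φ-nodes are needed in general.

```python
from itertools import count
from typing import List, Tuple

Phi = Tuple[str, List[str]]  # (new destination, [source from block A, source from block B])


def ssa_merge(var: str, defs_a: int, defs_b: int) -> Tuple[List[str], List[str], Phi]:
    """Subscript each definition of `var` in two predecessor blocks and build a φ-node."""
    if defs_a < 1 or defs_b < 1:
        raise ValueError("this sketch assumes both predecessor blocks define the variable")
    subscripts = count(1)
    names_a = [f"{var}_{next(subscripts)}" for _ in range(defs_a)]
    names_b = [f"{var}_{next(subscripts)}" for _ in range(defs_b)]
    # Only the last definition in each block reaches the merge point.
    phi = (f"{var}_{next(subscripts)}", [names_a[-1], names_b[-1]])
    return names_a, names_b, phi
```

`ssa_merge("x", 1, 2)` yields x_1 in one block, x_2 and x_3 in the other, and the φ-node x_4 = φ(x_1, x_3); later uses of x then read x_4.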
Please amend the second full paragraph on page 12 as follows:
For one embodiment, the select-µop has only one destination operand, and
therefore the select-µop in theory can have numerous source operands as long as the
large fan-ins of the sources can be efficiently implemented. For one embodiment, the
select-µop has four source operands: s0, s1, s2, and s3. For alternative embodiments,
more or fewer source operands could also be used. The source operands record
physical register identifiers. Except for s0, each one of the source operands s1, s2, and
s3 is associated with two status bits, a v-bit and a p-bit. The status bits control the
selection of the source operands. The first one of the status bits, the v-bit, specifies
whether the register is ready. The second status bit, the p-bit, indicates whether
the renamed definition register has been architecturally committed. The operation of
the status bits is explained in more detail below.
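One way to picture the four-source select-µop is as a record holding a destination, a default source s0, and up to three further sources that each carry a v-bit and a p-bit. The class and field names below are hypothetical, and the sketch only stores the status bits; the selection rule that uses them is described elsewhere in the specification and is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StatusSource:
    reg: str           # physical register identifier
    v: bool = False    # v-bit: the register is ready
    p: bool = False    # p-bit: the renamed definition register is architecturally committed


@dataclass
class SelectUop:
    dest: str          # the single destination operand
    s0: str            # default source operand; carries no status bits
    extra: List[StatusSource] = field(default_factory=list)  # s1, s2, s3

    MAX_SOURCES = 4    # four-source embodiment; a plain class attribute, not a field

    def __post_init__(self) -> None:
        if 1 + len(self.extra) > self.MAX_SOURCES:
            raise ValueError("more source operands than this embodiment provides")

    def ready_regs(self) -> List[str]:
        """Physical registers among s1..s3 whose v-bit marks them as ready."""
        return [s.reg for s in self.extra if s.v]
```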

Please amend the first full paragraph on page 13 as follows:
For an embodiment having four source operands, the processor can encounter
two, three, or four instructions that define register R before generating a select-µop to
resolve renaming ambiguity for register R. The generation of a select-µop is triggered
by two conditions. First, each one of the defining instructions, except the first defining
instruction, must be guarded by unresolved predicates. Second, because the first
instruction defines the default identifier, the first instruction must be one of the
following: an un-predicated instruction, a predicated instruction whose predicate has
been resolved true, or a previously generated select-µop.
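The two triggering conditions translate directly into a check over the defining instructions of register R in program order. This Python sketch assumes a hypothetical `DefiningInstr` record and a four-source limit; it only evaluates the conditions stated above and does not generate the select-µop itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DefiningInstr:
    predicated: bool
    predicate: Optional[bool] = None  # None = unresolved (meaningful only if predicated)
    is_select_uop: bool = False


def should_generate_select_uop(defs: List[DefiningInstr], max_sources: int = 4) -> bool:
    """Evaluate the triggering conditions for a select-µop on one register R."""
    if not 2 <= len(defs) <= max_sources:
        return False
    first, rest = defs[0], defs[1:]
    # Condition 1: every defining instruction except the first has an unresolved predicate.
    if not all(d.predicated and d.predicate is None for d in rest):
        return False
    # Condition 2: the first instruction supplies the default identifier, so it must be
    # un-predicated, predicated and resolved true, or a previously generated select-µop.
    return (not first.predicated) or first.predicate is True or first.is_select_uop
```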
Please amend the first full paragraph on page 14 as follows:




One embodiment is a method 750 of processing predicated instructions, as
shown in Fig. 7A. First, a plurality of predicated instructions assigned to a common
defined register is received in block 752. At least one of the predicated instructions is
out of order in a dynamic pipeline. Next, in block 754, the destination register for each
one of the predicated instructions is renamed. Then, as shown in block 756, the renamed
destination register, together with the predicate register of the predicated instruction, is
assigned to a source operand of a select-µop. Next, a valid predicate is determined in
block 758. The register of the select-µop that corresponds to the valid predicate is
selected in block 760. A consumer instruction is executed in block 762, wherein the
consumer instruction uses the data from the register corresponding to the valid
predicate. It will be further appreciated that the instructions represented by the blocks
in Fig. 7A are not required to be performed in the order illustrated, and that all the
processing represented by the blocks may not be necessary to practice the invention.
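Blocks 752 through 762 can be traced in a short Python model. The function name, the P-numbered physical registers, and the rule used for block 758 (the latest predicate that evaluated true, following the prioritized scheme of Fig. 2A) are assumptions made for illustration; the paragraph itself does not specify how the valid predicate is chosen. The predicated results are modeled as already computed values, and only the selection depends on the predicates.

```python
from itertools import count
from typing import Dict, List, Tuple


def process_predicated(pred_regs: List[str], pred_values: Dict[str, bool],
                       results: List[int]) -> int:
    """Hypothetical trace of blocks 752-762 for one consumer instruction.

    pred_regs:   guarding predicate registers of the received instructions, all writing
                 one common architectural register, in program order (block 752)
    pred_values: resolved predicate values, e.g. {"p9": True, "p3": False}
    results:     the value each instruction computes, in program order
    """
    if len(pred_regs) != len(results):
        raise ValueError("need one result per predicated instruction")
    fresh = count(0)
    regfile: Dict[str, int] = {}
    sources: List[Tuple[str, str]] = []  # select-µop sources: (renamed dest, predicate reg)
    for pred, value in zip(pred_regs, results):
        phys = f"P{next(fresh)}"      # block 754: rename the destination register
        regfile[phys] = value         # each result lands in its own physical register
        sources.append((phys, pred))  # block 756: assign (register, predicate) to a source
    # Block 758 (assumed rule): the valid predicate is the latest one that evaluated true.
    valid = next((src for src in reversed(sources) if pred_values[src[1]]), None)
    if valid is None:
        raise ValueError("no source of the select-µop has a true predicate")
    selected = valid[0]               # block 760: select the matching register
    return regfile[selected]          # block 762: the consumer uses the selected data
```

For two instructions guarded by p9 and p3 that compute 7 and 11, the consumer reads 7 when only p9 is true and 11 whenever p3 is true.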



Please amend the second full paragraph on page 15 as follows: 



For one embodiment, support for select-µops includes use of a register alias table
(RAT) with predicates. There are several approaches to support the generation of
select-µops as described above. For one embodiment, the RAT is augmented with
predicates and used in the rename stage. The RAT is used by the renaming unit to map
from architectural register identifiers to physical register identifiers. When an in-flight
instruction enters the rename stage, the RAT looks up the physical identifiers of the
source operands and assigns the result operand a new physical identifier.
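A RAT augmented with predicates can be modeled by letting each entry hold several slots, each pairing a physical identifier with its guarding predicate. The slot layout, the P-numbered identifiers, and the rule that an un-predicated write collapses the entry to a single slot are assumptions of this Python sketch rather than details stated in the paragraph.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Slot = Tuple[str, Optional[str]]  # (physical identifier, guarding predicate or None)


@dataclass
class RatEntry:
    slots: List[Slot] = field(default_factory=list)


class PredicatedRAT:
    """Hypothetical RAT whose entries can hold several predicated mappings."""

    def __init__(self, num_arch_regs: int) -> None:
        # Initially every architectural register rN maps, un-predicated, to PN.
        self.entries = {f"r{i}": RatEntry([(f"P{i}", None)]) for i in range(num_arch_regs)}
        self.next_phys = num_arch_regs

    def lookup(self, arch: str) -> List[str]:
        """Physical identifiers that may currently hold architectural register `arch`."""
        return [phys for phys, _ in self.entries[arch].slots]

    def rename(self, dest: str, srcs: List[str],
               pred: Optional[str] = None) -> Tuple[str, List[List[str]]]:
        phys_srcs = [self.lookup(s) for s in srcs]  # look up the source operands first
        new = f"P{self.next_phys}"                   # new physical id for the result
        self.next_phys += 1
        entry = self.entries[dest]
        if pred is None:
            entry.slots = [(new, None)]              # un-predicated write replaces all mappings
        else:
            entry.slots.append((new, pred))          # predicated write adds a candidate mapping
        return new, phys_srcs
```

In a 64-register model, two writes to r40 guarded by p9 and p3 leave `lookup("r40")` returning ["P40", "P64", "P65"]: three candidate mappings, which is the kind of ambiguity a select-µop is meant to resolve.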



Please amend the first full paragraph on page 16 as follows: 
For an alternative embodiment, a select-µop is injected only when a select-µop is
required, so as to avoid injecting excessive select-µops. In this embodiment, injection
of a select-µop is demand-driven, that is, it only occurs when more than one slot is
occupied in the entry and one of the following conditions is met:

i) the use of the register is encountered at the rename stage;

ii) all slots in the entry are occupied and a new physical identifier is being
allocated; or

iii) one of the guarding predicates in the slots is re-defined.
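The demand-driven rule reduces to a boolean test per RAT entry. In this Python sketch, the function name and its keyword flags are hypothetical stand-ins for the events listed above; how a real rename stage detects those events is not modeled.

```python
def must_inject_select_uop(occupied_slots: int, total_slots: int, *,
                           use_at_rename: bool = False,
                           allocating_new_id: bool = False,
                           predicate_redefined: bool = False) -> bool:
    """Demand-driven injection test for one RAT entry (hypothetical sketch)."""
    if occupied_slots <= 1:
        return False  # a single mapping is never ambiguous
    return (use_at_rename                                            # condition i)
            or (occupied_slots == total_slots and allocating_new_id)  # condition ii)
            or predicate_redefined)                                  # condition iii)
```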



Please amend the first full paragraph on page 19 as follows: 



The outcome of the variable opt is determined by an OR operation of conditions 1
and 2. However, for this embodiment, the source code was not fully rewritten for a
more succinct control flow. Therefore, condition 2 post-dominates condition 1, since
the variable opt is assigned zero if condition 2 is true, regardless of the outcome of
condition 1. Even though the reverse is also true in this embodiment (i.e., opt is
assigned zero if condition 1 is true, regardless of the outcome of condition 2), it does
not necessarily translate the same way in all cases. In the present embodiment, the
total number of cycles is 6. An embodiment more fully rewritten for more succinct
control flow can further reduce the execution process to 5 cycles.